cloud_storage: improve error handling #6524
0cda854 to da3ff97
Looks good to me.
src/v/cloud_storage/remote.cc
Outdated
if (auto code = abs(cerr.code().value());
    code != ECONNREFUSED && code != ENETUNREACH && code != ETIMEDOUT
    && code != ECONNRESET && code != EPIPE && code != EBADR) {
Hmm, I'm confused by the commit message. how is this error:
error GnuTLS:-110, The TLS connection was non-properly terminated.
related to handling for standard errno.h codes like ETIMEDOUT? i was under the impression that gnutls error codes were unrelated to standard error codes.
should this be a function instead? (so expanding it is just adding one element to the array + we can add a unit test)
static_array<int> codes {{ refused, reset, unreach, timedout, badaddr, pipe }}
for (c in code) {
if code == c; return true
}
return false
kinda thing
also, is nice if it's a func in case at some point we do openssl
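The suggestion above could be sketched roughly as follows. This is only an illustration, not the code that landed in the PR: the names `retryable_errnos` and `is_retryable_errno` are hypothetical, and the list holds only the POSIX codes from the diff under review (the Linux-specific EBADR is left out here because its relevance is questioned in this thread).

```cpp
#include <array>
#include <cerrno>

// Hypothetical names, for illustration only.
// Adding a new retryable code means adding one element to this array.
constexpr std::array<int, 5> retryable_errnos{
  ECONNREFUSED, ENETUNREACH, ETIMEDOUT, ECONNRESET, EPIPE};

// Returns true if `code` is one of the errno values treated as a
// transient network failure. No abs() is applied: a negative value
// is not an errno and is deliberately not matched.
constexpr bool is_retryable_errno(int code) {
    for (int c : retryable_errnos) {
        if (c == code) {
            return true;
        }
    }
    return false;
}
```

Keeping the check in a free function makes it easy to unit test in isolation, and leaves one place to extend if another TLS backend is ever added.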
Oh, you're right. Good catch.
I was assuming that something was munging gnutls errors into vaguely-semantically-equivalent posix errnos, but in fact they're totally separate.
I guess I need to revise this to do a disgusting string comparison on the system_error's message to identify gnutls cases, and then compare them with official gnutls error codes from their header.
You're right. It's not. I assumed gnutls maps its error codes to Linux error codes, but uses the negated values for some reason. That doesn't seem to be the case. I guess we should silence specific gnutls errors.
@jcsp perhaps you don't need to parse the string. if these gnutls-errors-as-std::system_error are originating from seastar, then I see that seastar is already attaching a static const gnutls_error_category glts_errorc; so we can identify which are errno.h and which are gnutls.
This `if` condition was becoming unwieldy.
These are a proxy for underlying connection issues, e.g.
- "error GnuTLS:-110, The TLS connection was non-properly terminated."
- "error GnuTLS:-53, Error in the push function."
Treat them as retryable the same way we do other transient network errors when communicating with an S3 endpoint.
da3ff97 to 201886c
// The name() of seastar's gnutls_error_category class
constexpr std::string_view gnutls_cateogry_name{"GnuTLS"};

if (e.code().category().name() == gnutls_cateogry_name) {
i wonder why seastar didn't put this in a header so we could check the category without string comparison 😢
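For illustration, a category-name check like the one in the diff could be combined with the two GnuTLS codes named in the commit message roughly as below. This is a sketch under assumptions, not the PR's implementation: `fake_gnutls_category` is a hypothetical stand-in for seastar's category class (which is not exposed in a header), `is_retryable_gnutls_error` is a made-up name, and the numeric values are those of `GNUTLS_E_PUSH_ERROR` and `GNUTLS_E_PREMATURE_TERMINATION` in `gnutls/gnutls.h`.

```cpp
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for seastar's gnutls_error_category.
// Only its name() matters for this sketch.
class fake_gnutls_category final : public std::error_category {
public:
    const char* name() const noexcept override { return "GnuTLS"; }
    std::string message(int ev) const override {
        return "gnutls error " + std::to_string(ev);
    }
};

inline const std::error_category& fake_gnutls() {
    static const fake_gnutls_category instance{};
    return instance;
}

// Values copied from gnutls/gnutls.h so the sketch builds without GnuTLS.
constexpr int gnutls_e_push_error = -53;             // GNUTLS_E_PUSH_ERROR
constexpr int gnutls_e_premature_termination = -110; // GNUTLS_E_PREMATURE_TERMINATION

constexpr std::string_view gnutls_category_name{"GnuTLS"};

// True only for the two GnuTLS codes mentioned in this PR. The category
// is checked first, so an errno with the same magnitude never matches.
inline bool is_retryable_gnutls_error(const std::error_code& ec) {
    if (std::string_view{ec.category().name()} != gnutls_category_name) {
        return false;
    }
    return ec.value() == gnutls_e_push_error
           || ec.value() == gnutls_e_premature_termination;
}
```

Checking the category before the value is what keeps GnuTLS codes and errno.h codes apart: an `std::error_code` is a (value, category) pair, so 110 in the generic category and -110 in the GnuTLS category are unrelated errors.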
This passed tests here: https://buildkite.com/redpanda/redpanda/builds/15766 The |
Cover letter
We have recently seen two error codes from GnuTLS when running against AWS S3:
"error GnuTLS:-110, The TLS connection was non-properly terminated."
"error GnuTLS:-53, Error in the push function."
Examples:
Handle these by adding EBADR for the 53 case, and an abs() around the code to handle negative error codes in general: the 110 code (ETIMEDOUT) was already handled but only when positive.
Backport Required
UX changes
None
Release notes
Improvements